ABSTRACT

Readability estimates are usually based on measures
of word difficulty and measures of sentence difficulty. Word
difficulty is measured in two ways: by the structural size and
complexity of words or by reference to phenomena of language use,
such as word-list frequency or the regularity of spelling patterns.
Sentence difficulty is measured only in terms of size or complexity, 
despite models of the reading process which suggest that readers 
could use a knowledge of recurring syntactic patterns to economize 
their scan of the text. This study identified syntactic patterns 
which occurred in samples of student writing and text and counted how 
often each pattern recurred. McCall-Crabbs test passages were 
classified for the commonness of their sentence patterns according to 
these counts. The commonness of the patterns was significantly 
associated with the readability of the passages, with other factors 
controlled. The results bear implications for measuring readability 
and designing instruction. (Author) 
"Development of a Frequency-based Measure of
Syntactic Difficulty for Estimating Readability" 

Ramsay Selden 
Department of Research Methodology 
University of Virginia 

Purpose 

The purpose of this study was to improve the methods used to
measure the syntactic difficulty of text as a factor which affects
readability. Specifically, the study was conducted in order to
determine whether the syntactic patterns which are used in
sentences are repeated with varying frequencies in the writing of
school children or in typical textbook prose, and whether the
frequency of this recurrence is associated with the readability
of the patterns when they appear in text. If this were true, it
would suggest that readers use a sense of expectation of common
sentence patterns to guide and economize their search for
textual features which give cues to the meaning of the text.
Further, if this variable were found to be useful, it would
provide a basis for selecting the instructional materials presented
to readers according to syntactic difficulty, and it could
strengthen the basis for sequencing syntactically-based
comprehension skills in an instructional program.

Conceptual Rationale 

Readability formulae generally include two types of variable — 
measures of word difficulty and measures of sentence difficulty. 
Lively and Pressey (1923) developed a readability estimation
technique which was based on the placement of the words
contained in a piece of text in Thorndike's frequency-stratified
word-list. Later, Vogel and Washburne (1928), Gray and Leary
(1935), and others developed techniques for estimating readability
which included a measure of word familiarity based on a
frequency-stratified word-list and a measure of sentence difficulty
based on structural complexity or mean length. Widely used
formulae such as those developed by Dale and Chall (1948),
Spache (1953), and Dolch (1948) are based on measures of sentence
length and on an index of the familiarity of the words determined
by looking them up in a frequency-stratified word-list. Recent
formulae developed by Fry (1968) and by Harris and Jacobson (1974)
include different measures. The Fry method is based on mean
sentence length and mean word length in syllables. The
Harris-Jacobson method includes vocabulary familiarity, word length,
sentence length, and the presence of words with irregular or
uncommon sound-letter correspondences.

The variables which are generally used to measure word
difficulty are of two types. The first type is based on the premise
that the structural size and complexity of the word affect the
difficulty with which it can be read; this type of measure is
exemplified by measures of mean word length in syllables or letters,
and by measures of the mean number of affixes present on words in
the text. The second type of variable is based on the premise that
phenomena of usage determine the familiarity of whole words, or
determine the ease with which a word can be decoded using
sound-letter correspondences; this type is exemplified by
word-list-based measures of word familiarity, and by the
Harris-Jacobson measure of sound-letter regularity.

While readability formulae include both types of
word-difficulty variable, they include only measures of sentence
difficulty which are based on the premise that structural size
and complexity affect the readability of sentences. The frequency
or regularity of sentence-structure phenomena has not been
systematically identified and used as a factor affecting the
readability of text. Bormuth (1969) studied a large number of
sentence-structure characteristics in relation to readability, but
was not able to conclude what general factors in sentence
construction seemed to affect readability. Similarly, Granowski
(1971) and Botel and Granowski (1972) report development of an
index of sentence difficulty which includes consideration of
structure familiarity, but the eclectic nature of the index
precludes determination of the usefulness of this variable.

Models of the reading process suggest that readers can use
expectations of sentence patterns to guide their search for cues to
meaning. Smith (1971) posits a model of the reading process in
which the reader uses an active, anticipatory approach to the text,
along with the redundancy in the textual code, to derive meaning
from a minimal number of visual cues. Gibson and Levin (1975) see
the reader using context, redundancy, and prior experience with the
language in order to derive meaning economically and efficiently.
Goodman's (1968) "guessing game" model portrays the reader as using
available prior information to hypothesize the meaning of the
upcoming text, providing a basis for scanning the text for cues.
These models support the notion that if sentence patterns occur
in varying frequencies, the presence of more common or familiar
patterns would facilitate the ease with which the text could be
read. Strickland (1962) developed a basis for measuring
sentence-pattern familiarity in terms of children's oral language.
Bormuth (1964) found a non-significant association between this
variable and passage cloze scores, but did find their
relationship to be curvilinear.

The present study was conducted in order to identify which 
patterns recur in student writing and in widely-used text, and 
to determine whether patterns which recur more often, if present, 
are associated with lower reading difficulty. 

Methods 

The study was conducted in two phases. In the first phase,
samples of student writing and textbook prose were scanned to
identify those syntactic patterns which occurred and to count how
often each recurred. To do this, the words in the Basic
Elementary Reading Vocabularies (Harris and Jacobson, 1972) were
classified according to their part-of-speech function using the
scheme developed by Fries (1952). Next, ten samples of writing
and text, one each for writing and text in grades 2, 4, 6, 9,
and 12, were divided into t-units (Hunt, 1964) in order to provide
standard sentence-marking throughout the samples. These t-units
were scanned using computer processing procedures developed at
the University of Virginia; the procedures scanned each t-unit,
looked up each word in the classified vocabulary, assigned a coded
character for each word to indicate its syntactic classification,
and catalogued a string of these codes for the words in the
t-unit. A series of FORTRAN programs was used to count how often
the strings occurred in each of the ten samples of prose. The
final product of this phase was a master listing of syntactic
patterns and their recurrence in the ten text and writing samples.
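
The first-phase procedure can be illustrated with a minimal sketch.
It is written in Python rather than the FORTRAN used in the study,
and it is not the original program. The lexicon, the one-character
codes, and the function names are all hypothetical; the codes do not
reproduce the Fries classification.

```python
from collections import Counter

# Hypothetical mini-lexicon: word -> one-character class code.
# The codes are illustrative only, not the Fries (1952) labels.
LEXICON = {
    "the": "D", "a": "D",
    "dog": "N", "cat": "N", "ball": "N",
    "chased": "V", "saw": "V",
    "big": "A",
}
UNKNOWN = "?"  # assumed placeholder for words missing from the lexicon


def encode_t_unit(t_unit: str) -> str:
    """Map a t-unit to a string of class codes, one code per word."""
    words = t_unit.lower().split()
    return "".join(LEXICON.get(w.strip(".,;:!?"), UNKNOWN) for w in words)


def count_patterns(t_units: list[str]) -> Counter:
    """Count how often each coded pattern recurs within one sample."""
    return Counter(encode_t_unit(t) for t in t_units)


# One toy "sample" of three t-units; the study used ten real samples.
sample = [
    "The dog chased the cat.",
    "A cat saw the ball.",
    "The big dog saw a cat.",
]
counts = count_patterns(sample)
for pattern, n in counts.most_common():
    print(pattern, n)
```

Running one such count per sample and merging the results by pattern
would give a table comparable in shape to the master listing.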

In the second phase, this master listing was used to provide
an index of the familiarity of sentence patterns in a set of 161
passages from the McCall-Crabbs Standard Test Lessons in Reading
(McCall and Crabbs, 1961). The passages were marked off into
t-units, and the t-units were converted into coded syntactic
strings as was done for the text samples. Each string was located
in the master listing to find its recurrence in the ten text
samples. These ten values were averaged for the strings in the
passage to provide ten values for the passage as a whole. These
ten indices were later analyzed in relation to the readability of
the passage. Three counts were made of the number of t-units in
each passage which were one to ten words in length, eleven to
twenty words in length, and twenty-one to thirty words in length.
This provided a rich measure of t-unit length which could be
controlled in the analysis. Finally, the criterion measures of
passage comprehensibility were recorded. The McCall-Crabbs
seventy percent criterion (MC70) was used; this is the mean grade
equivalent of pupils who answered seventy percent of the
comprehension questions on the passage correctly in McCall-Crabbs
norming studies.
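
The passage-level variables described above might be computed along
the following lines. This is a sketch under stated assumptions, not
the study's program: the master listing, the sample names, and the
counts are invented for illustration, and treating a pattern that is
absent from the listing as a recurrence of zero is an assumption the
paper does not state.

```python
from statistics import mean

# Hypothetical master listing: pattern -> recurrence count per sample.
# Only two samples are shown; the study used ten.
MASTER = {
    "DNVDN": {"writing_gr2": 12, "text_gr2": 7},
    "DANVDN": {"writing_gr2": 3, "text_gr2": 5},
}
SAMPLES = ["writing_gr2", "text_gr2"]


def commonness_indices(patterns):
    """Average each sample's recurrence count over a passage's patterns.

    Patterns missing from the master listing count as zero (assumed).
    """
    return {
        s: mean(MASTER.get(p, {}).get(s, 0) for p in patterns)
        for s in SAMPLES
    }


def length_counts(t_units):
    """Count t-units of 1-10, 11-20, and 21-30 words."""
    bins = [0, 0, 0]
    for t in t_units:
        n = len(t.split())
        if 1 <= n <= 30:  # longer t-units fall outside the three counts
            bins[(n - 1) // 10] += 1
    return bins


indices = commonness_indices(["DNVDN", "DANVDN"])
print(indices)
```

Each passage then carries one index per sample plus the three length
counts, which is the form the regression analysis requires.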
Analysis

Multiple regression analyses were calculated using the
McCall-Crabbs seventy percent criterion as the dependent variable.

In the first analysis, the restricted equation was based on
estimates of readability provided by the Harris-Jacobson formula
(based on sentence length, word length, word-list frequency, and
presence of sound-letter irregularities) and the three counts of
t-unit length. The full model added the ten indices of pattern
commonness. The following multiple R values were obtained:

TABLE 1

Variables                    Multiple R   R^2      F-ratio   df      P

Pattern length counts,       .62169       .38650
H-J readability estimates

Pattern commonness           .65055       .42321   1.040     9,147   > .05
variables



The table shows that the pattern commonness variables increase 
the overall strength of the estimation somewhat, but not to a 
significant degree. 

Since Bormuth's (1964) study had indicated a curvilinear
association, the pattern-commonness variables were squared and
the analysis was made again:
TABLE 2

Variables                    Multiple R   R^2      F-ratio   df      P

Pattern length counts,       .62169       .38650
H-J readability estimates

Pattern commonness           .67664       .45784   2.15      9,147   < .05
variables and squares



With the use of squared terms to account for a curvilinear
relationship between passage difficulty and sentence-pattern
familiarity, a significant increment in the estimation is obtained.
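
The F-ratios in Tables 1 and 2 are consistent with the standard test
for an increment in R^2 between a restricted and a full regression
model. The sketch below recomputes them from the reported R^2 values.
It assumes that this standard test is the one that was applied; the
degrees of freedom are taken from the tables as reported, and the
function name is hypothetical.

```python
def f_for_r2_increment(r2_restricted, r2_full, df_num, df_den):
    """F-ratio for the gain in R^2 when predictors are added.

    F = ((R2_full - R2_restricted) / df_num) / ((1 - R2_full) / df_den)
    """
    gain_per_df = (r2_full - r2_restricted) / df_num
    residual_per_df = (1.0 - r2_full) / df_den
    return gain_per_df / residual_per_df


R2_RESTRICTED = 0.38650  # length counts plus H-J estimates (both tables)

# Table 1: pattern-commonness variables added in linear form.
f_linear = f_for_r2_increment(R2_RESTRICTED, 0.42321, 9, 147)

# Table 2: pattern-commonness variables and their squares.
f_squared = f_for_r2_increment(R2_RESTRICTED, 0.45784, 9, 147)

print(round(f_linear, 3), round(f_squared, 2))
```

Both recomputed values agree with the tabled F-ratios of 1.040 and
2.15 to the precision reported.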

Discussion 

The association between the commonness of sentence patterns
in a passage of text and the readability of the passage was found
to be significant and curvilinear. Subsequent analyses of the
data have been performed, using as the criterion the comprehensibility
of subsets of the passages to students in grades 4, 6, 9, and 12.
These analyses suggest that the usefulness of sentence-pattern
familiarity is strongest for older readers. Apparently, use of
these frequency-based expectancies depends on the level of difficulty
of the passage, and on the ability of the reader.

Although these findings are very preliminary, it can be concluded
that frequency-based expectancies of syntax are useful to maturing
readers in certain textual situations.
Conclusions

These findings suggest that estimates of readability used to
construct a sequence of materials which are graded in difficulty
should account for the commonness of syntactic patterns. They also
suggest that the use of pattern familiarity and the ability to
contend with unfamiliar patterns are reading skills which should be
developed in instructional programs. This could be accomplished
through the exposure of learning readers to a large, diverse amount
of natural text to give them a basis for syntactic expectancies.
Further, games or exercises which work with these expectancies to
predict upcoming text seem to be supported.